Finding Optimal Multi-Splits for Numerical Attributes in Decision Tree Learning
Authors
Abstract
Handling continuous attribute ranges remains a deficiency of top-down induction of decision trees. They require special treatment and do not fit the learning scheme as well as one could hope for. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. This topic has attracted abundant attention in recent years. In particular, Fayyad and Irani showed how optimal binary partitions can be found efficiently. Later, they based a greedy heuristic multipartitioning algorithm on these results. Recently, Fulton, Kasif, and Salzberg attempted to develop algorithms for finding the optimal multi-split for a numerical attribute in one phase. We prove that, as in binary partitioning, only boundary points need to be inspected in order to find the optimal multipartition of a numerical value range. We develop efficient algorithms for finding the optimal splitting into more than two intervals. The resulting partition is guaranteed to be optimal w.r.t. the function that is used to evaluate the attributes' utility in class prediction. We contrast our method with alternative approaches in initial empirical experiments. They show that the new method consistently surpasses the greedy heuristic approach of Fayyad and Irani in the goodness of the produced multi-split but, with small data sets, cannot quite attain the efficiency of the greedy approach. Furthermore, our experiments reveal that one of the techniques proposed by Fulton, Kasif, and Salzberg is of little use in practical tasks, since its time consumption fails to meet reasonable demands. In addition, it categorically fails to find the optimal multi-split because of an error in the rationale of the method.
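The boundary-point idea can be made concrete with a small sketch. The following is not the authors' algorithm but a minimal illustration of the technique the abstract describes: restrict candidate cut points to positions in the sorted sample where the class changes, then use dynamic programming to choose the k-1 cuts that minimize the weighted class entropy of the resulting intervals. All function names are our own, and the boundary test used here (both value and class change between adjacent examples) is a simplification of the full boundary-point definition.

```python
from math import log2

def optimal_multisplit(values, labels, k):
    """Split a numeric attribute into k intervals minimizing weighted
    class entropy, inspecting only (simplified) boundary points."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # Candidate cut positions i: cut falls between pairs[i-1] and pairs[i].
    cuts = [i for i in range(1, n)
            if pairs[i - 1][0] != pairs[i][0] and pairs[i - 1][1] != pairs[i][1]]
    segs = [0] + cuts + [n]      # atomic blocks; intervals are unions of blocks
    m = len(segs) - 1

    def cost(a, b):
        # Weighted entropy of the examples covered by blocks a..b-1.
        block = [c for _, c in pairs[segs[a]:segs[b]]]
        counts = {}
        for c in block:
            counts[c] = counts.get(c, 0) + 1
        sz = len(block)
        ent = -sum((v / sz) * log2(v / sz) for v in counts.values())
        return (sz / n) * ent

    INF = float("inf")
    # dp[t][b]: best cost of covering blocks 0..b-1 with t intervals.
    dp = [[INF] * (m + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for t in range(1, k + 1):
        for b in range(t, m + 1):
            dp[t][b] = min(dp[t - 1][a] + cost(a, b) for a in range(t - 1, b))
    return dp[k][m]
```

Because every interval border must coincide with a boundary point, the dynamic program runs over the (usually few) blocks between boundary points rather than over all n-1 possible cut positions.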
منابع مشابه
Building multi-way decision trees with numerical attributes
Decision trees are probably the most popular and commonly used classification model. They are recursively built following a top-down approach (from general concepts to particular examples) by repeated splits of the training dataset. When this dataset contains numerical attributes, binary splits are usually performed by choosing the threshold value which minimizes the impurity measure used as sp...
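The binary-split selection described above can be sketched as follows. This is an illustrative reconstruction, not code from the paper: it tries each midpoint between adjacent distinct values with differing classes (a simplified boundary-point test) and keeps the threshold with the lowest weighted class entropy.

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((v / n) * log2(v / n) for v in counts.values())

def best_threshold(values, labels):
    """Return the midpoint cut minimizing the weighted class entropy
    of the induced binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_imp = None, float("inf")
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if v1 == v2 or c1 == c2:    # only boundary candidates are tried
            continue
        cut = (v1 + v2) / 2
        left = [c for v, c in pairs if v <= cut]
        right = [c for v, c in pairs if v > cut]
        imp = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if imp < best_imp:
            best_cut, best_imp = cut, imp
    return best_cut, best_imp
```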
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi agent case, have long been used to model multi agent systems and serve as a suitable framework for multi agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
MMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretability are two measures that conflict with each other. In this approach, we consider both the accuracy and the interpretability of rule sets. Additionally, individual classifiers face other problems, such as huge sizes, high dimensionality, and imbalanced class distributions in data sets. This...
Improved Use of Continuous Attributes in C4.5
A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes. An MDL-inspired penalty is applied to such tests, eliminating some of them from consideration and altering the relative desirability of all tests. Empirical trials show that the modifications lead to smaller decision trees with higher predict...
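The MDL-inspired penalty can be illustrated with a small sketch. This is our reading of the idea, not code from the paper: the information gain of a test on a continuous attribute is charged the cost of encoding which of the N-1 candidate thresholds was chosen, where N is the number of distinct values in the training subset.

```python
from math import log2

def mdl_adjusted_gain(gain, n_distinct, n_examples):
    """Reduce the information gain of a continuous-attribute test by
    the per-example cost of encoding the chosen threshold (an
    MDL-inspired penalty in the spirit of C4.5)."""
    if n_distinct < 2:
        return float("-inf")  # no threshold exists for this attribute
    return gain - log2(n_distinct - 1) / n_examples
```

Tests whose adjusted gain drops to zero or below are eliminated from consideration, which is one way such a penalty yields smaller trees.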